---
name: hypogenic
description: Automated LLM-driven hypothesis generation and testing for tabular datasets; use when you need systematic exploration of empirical patterns (e.g., fraud detection, content analysis) and want to combine literature insights with data-driven hypothesis evaluation.
license: MIT
author: aipoch
---

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- **Exploratory analysis on a new dataset** where you want the model to propose multiple *testable* hypotheses from observed patterns (e.g., AI-generated text detection).
- **Benchmarking competing explanations** by generating a hypothesis bank and evaluating each hypothesis consistently on validation/test splits.
- **Literature-informed research** where you want to extract claims from papers and refine them against real data (e.g., deception cues in reviews).
- **High-coverage hypothesis discovery** when you need both theory-driven and data-driven hypotheses, then merge and deduplicate them (Union workflows).
- **Hypothesis-driven classification/regression pipelines** for domains like fraud detection, content moderation, mental health indicators, or other empirical studies using tabular/JSON datasets.

## Key Features

- **Automated hypothesis generation (HypoGeniC)**: iteratively proposes and improves hypotheses using dataset feedback.
- **Literature + data integration (HypoRefine)**: extracts literature insights from PDFs and refines hypotheses jointly with empirical signals.
- **Union method**: mechanically merges literature-only hypotheses with HypoGeniC/HypoRefine outputs to maximize coverage and reduce redundancy.
- **Config-driven prompting**: YAML templates with variable injection (e.g., `${text_features_1}`, `${num_hypotheses}`) for generation and inference.
- **Scalable experimentation**: optional Redis caching, parallelism, and adaptive selection focusing on hard examples.
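To make the variable-injection mechanic concrete, here is a minimal sketch of how `${...}` placeholders in the YAML templates get filled with per-example values. It uses Python's standard-library `string.Template` purely as an illustration; hypogenic's internal substitution code may differ, and the feature value shown is made up.

```python
from string import Template

# A prompt template in the same ${...} style as the YAML config.
# (Illustrative only -- not hypogenic's actual internals.)
user_template = Template(
    "Given examples and labels, generate ${num_hypotheses} distinct hypotheses.\n"
    "Feature 1: ${text_features_1}"
)

# Fill placeholders with run-time values (hypothetical example data).
prompt = user_template.substitute(
    num_hypotheses=5,
    text_features_1="This review repeats superlatives unusually often.",
)
print(prompt)
```

Any placeholder left unfilled makes `substitute` raise a `KeyError`, which is why the feature keys in your dataset JSON must match the placeholder names in the config exactly.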
## Dependencies

- `hypogenic` (install via PyPI; version depends on your environment)
- Optional (recommended for cost/performance):
  - `redis` (server; used for caching repeated LLM calls)
- Optional (required for literature/PDF workflows such as HypoRefine):
  - `GROBID` (service; used for PDF preprocessing)
  - `s2orc-doc2json` (PDF-to-structured conversion used in literature pipelines)

Install:

```bash
uv pip install hypogenic
```

## Example Usage

The following example is a minimal end-to-end workflow (dataset + config + CLI + Python). Adjust paths and prompts for your task.

### 1) Prepare a dataset (HuggingFace-style JSON)

Create three files:

- `./data/my_task_train.json`
- `./data/my_task_val.json`
- `./data/my_task_test.json`

Example schema (feature keys can be renamed, but they must match the placeholders in your config):

```json
{
  "text_features_1": ["Text A1", "Text A2"],
  "text_features_2": ["Text B1", "Text B2"],
  "label": ["Class1", "Class2"]
}
```

### 2) Create `./data/my_task/config.yaml`

```yaml
task_name: my_task

train_data_path: ./data/my_task_train.json
val_data_path: ./data/my_task_val.json
test_data_path: ./data/my_task_test.json

prompt_templates:
  observations: |
    Feature 1: ${text_features_1}
    Feature 2: ${text_features_2}
    Label: ${label}

  batched_generation:
    system: |
      You are a scientific assistant. Propose testable, falsifiable hypotheses that map features to labels.
    user: |
      Given examples and labels, generate ${num_hypotheses} distinct hypotheses.
      Return a JSON list of hypotheses, each with a short name and a testable statement.

  inference:
    system: |
      You are a careful classifier. Use the provided hypothesis to predict the label.
    user: |
      Hypothesis: ${hypothesis}
      Feature 1: ${text_features_1}
      Feature 2: ${text_features_2}
      Output the final answer as: "final answer: